NSF PAR Search | NSF Public Access Repository


Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Cen, S; Mei, J; Goshvadi, K; Dai, H; Yang, T; Yang, S; Schuurmans, D; Chi, Y; Dai, B (April 2025, The Thirteenth International Conference on Learning Representations)

Free, publicly-accessible full text available April 24, 2026
Data-Adaptive Discriminative Feature Localization with Statistically Guaranteed Interpretation

Dai, B.; Shen, X.; Li, C.; Chen, C.; Pan, W. (October 2023, Annals of Applied Statistics)

critical to reveal a black-box model’s decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and MIT-BIH Electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely adaptiveness, predictive importance and effectiveness. Then, we develop a localization framework based on adversarial attacks to effectively localize discriminative features. In contrast to existing heuristic methods, we also provide statistically guaranteed interpretability of the localized features by measuring a generalized partial $R^2$. We apply the proposed method to the MNIST dataset and the MIT-BIH dataset with a convolutional auto-encoder. In the first, the compact image regions localized by the proposed method are visually appealing. Similarly, in the second, the identified ECG features are biologically plausible and consistent with cardiac electrophysiological principles while locating subtle anomalies in a QRS complex that may not be discernible by the naked eye. Overall, the proposed method compares favorably with state-of-the-art competitors. Accompanying this paper is a Python library dnn-locate that implements the proposed approach.
Full Text Available
Oracle Inequalities for Model Selection in Offline Reinforcement Learning

Lee, J.; Tucker, G.; Nachum, O.; Dai, B.; Brunskill, E. (January 2022, Advances in Neural Information Processing Systems)

In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximation. The learner is given a nested sequence of model classes to minimize squared Bellman error and must select among these to achieve a balance between approximation and estimation error of the classes. We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors. The algorithm, MODBE, takes as input a collection of candidate model classes and a generic base offline RL algorithm. By successively eliminating model classes using a novel one-sided generalization test, MODBE returns a policy with regret scaling with the complexity of the minimally complete model class. In addition to its theoretical guarantees, it is conceptually simple and computationally efficient, amounting to solving a series of square loss regression problems and then comparing relative square loss between classes. We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
Full Text Available
Combiner: Full Attention Transformer with Sparse Computation Cost

https://doi.org/10.48550/arXiv.2107.05768

Ren, H; Dai, H; Dai, Z; Yang, M; Leskovec, J; Schuurmans, D; Dai, B (December 2021, Advances in Neural Information Processing Systems)

Transformers provide a class of expressive architectures that are extremely effective for sequence modeling. However, the key limitation of transformers is their quadratic memory and time complexity $O(L^2)$ with respect to the sequence length in attention layers, which restricts application in extremely long sequences. Most existing approaches leverage sparsity or low-rank assumptions in the attention matrix to reduce cost, but sacrifice expressiveness. Instead, we propose Combiner, which provides full attention capability in each attention head while maintaining low computation and memory complexity. The key idea is to treat the self-attention mechanism as a conditional expectation over embeddings at each location, and approximate the conditional distribution with a structured factorization. Each location can attend to all other locations, either via direct attention, or through indirect attention to abstractions, which are again conditional expectations of embeddings from corresponding local regions. We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention, resulting in the same sub-quadratic cost ($O(L \log L)$ or $O(L\sqrt{L})$). Combiner is a drop-in replacement for attention layers in existing transformers and can be easily implemented in common frameworks. An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach, yielding state-of-the-art results on several image and text modeling tasks.
Full Text Available
Smooth Neighborhood Recommender Systems

Dai, B.; Wang, J.; Shen, X.; Qu, A. (February 2019, Journal of Machine Learning Research)

Recommender systems predict users’ preferences over a large number of items by pooling similar information from other users and/or items in the presence of sparse observations. One major challenge is how to utilize user-item specific covariates and networks describing user-item interactions in a high-dimensional situation, for accurate personalized prediction. In this article, we propose a smooth neighborhood recommender in the framework of the latent factor models. A similarity kernel is utilized to borrow neighborhood information from continuous covariates over a user-item specific network, such as a user’s social network, where the grouping information defined by discrete covariates is also integrated through the network. Consequently, user-item specific information is built into the recommender to battle the “cold-start” issue in the absence of observations in collaborative and content-based filtering. Moreover, we utilize a “divide-and-conquer” version of the alternating least squares algorithm to achieve scalable computation, and establish asymptotic results for the proposed method, demonstrating that it achieves superior prediction accuracy. Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and on real benchmark data, the Last.fm music data.
Full Text Available
Predictive Approximate Bayesian Computation via Saddle Points

Yang, Y.; Dai, B.; Kiyavash, N.; He, N. (January 2018, Proceedings of Machine Learning Research)

Full Text Available
Direct observation of the exotic $\beta$-$\gamma$-$\alpha$ decay mode in the $T_z = -1$ nucleus ${}^{20}\mathrm{Na}$

https://doi.org/10.1103/PhysRevC.103.L011301

Wang, Y. B.; Su, J.; Han, Z. Y.; Tang, B.; Cui, B. Q.; Ge, T.; Lyu, Y. L.; Brown, B. A.; Yuan, C. X.; Chen, L. H.; et al. (January 2021, Physical Review C)
Full Text Available
